Probabilistic Entity Linkage for Heterogeneous Information Spaces
Authors
Abstract
Heterogeneous information spaces are typically created by merging data from a variety of applications and information sources. These sources often use different identifiers for data that describe the same real-world entity (for example an artist, a conference, an organization). In this paper we propose a new probabilistic Entity Linkage algorithm for identifying and linking data that refer to the same real-world entity. Our approach focuses on managing entity linkage information in heterogeneous information spaces using probabilistic methods. We use a Bayesian network to model the evidence that supports possible object matches, along with the interdependencies between pieces of evidence. This enables us to flexibly update the network when new information becomes available, and to cope with the different requirements imposed by applications built on top of information spaces.
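As an illustration of the idea of revising a match probability when new evidence arrives, here is a minimal sketch. It is not the paper's algorithm: it assumes the simplest possible Bayesian network, in which each piece of evidence is conditionally independent given the match variable, and therefore ignores the interdependencies the abstract mentions. The names `Evidence`, `posterior_match`, and `is_linked`, and all probability values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One observed piece of evidence about a candidate pair (hypothetical type)."""
    name: str
    p_given_match: float     # P(evidence | both records are the same entity)
    p_given_nonmatch: float  # P(evidence | records are different entities)


def posterior_match(prior: float, evidence: list[Evidence]) -> float:
    """Return P(match | evidence) under a conditional-independence assumption."""
    weight_match = prior
    weight_nonmatch = 1.0 - prior
    for item in evidence:
        weight_match *= item.p_given_match
        weight_nonmatch *= item.p_given_nonmatch
    total = weight_match + weight_nonmatch
    # If the evidence is impossible under both hypotheses, fall back to the prior.
    return prior if total == 0.0 else weight_match / total


def is_linked(probability: float, threshold: float) -> bool:
    """Each application may pick its own threshold for accepting a link."""
    return probability >= threshold


# Start with one piece of evidence, then add a second when it becomes available.
observed = [Evidence("similar_name", 0.9, 0.1)]
p_first = posterior_match(0.5, observed)

observed.append(Evidence("same_affiliation", 0.8, 0.2))
p_second = posterior_match(0.5, observed)

# A strict application and a lenient one can read the same probability differently.
strict = is_linked(p_first, threshold=0.95)
lenient = is_linked(p_first, threshold=0.80)
```

Keeping the probability itself, and leaving the accept/reject decision to a per-application threshold, is one simple way to serve applications with different precision requirements from the same linkage information.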
Similar resources
Entity linkage for heterogeneous, uncertain, and volatile data
A plethora of collections is nowadays created by merging data from a variety of different applications and information sources. These sources often use different identifiers for data that describe the same real world object, for example an artist, a conference, an organization. The large number of existing entity linkage approaches are not designed for the characteristics of modern applications...
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When comprehensive information about a topic is scattered among two or more data sets, using only one of them loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognizing duplicates within a data set. The i...
Totally probabilistic Lp spaces
In this paper, we introduce the notion of probabilistic valued measures as a generalization of non-negative measures and construct the corresponding $L^p$ spaces, for distributions $p > \varepsilon_0$. It is also shown that if the distribution $p$ satisfies $p \geq \varepsilon_1$ then, as in the classical case, these spaces are complete probabilistic normed spaces.
A COMMON FRAMEWORK FOR LATTICE-VALUED, PROBABILISTIC AND APPROACH UNIFORM (CONVERGENCE) SPACES
We develop a general framework for various lattice-valued, probabilistic and approach uniform convergence spaces. To this end, we use the concept of $s$-stratified $LM$-filter, where $L$ and $M$ are suitable frames. A stratified $LMN$-uniform convergence tower is then a family of structures indexed by a quantale $N$. For different choices of $L,M$ and $N$ we obtain the lattice-valued, probabili...
On-the-Fly Entity-Aware Query Processing in the Presence of Linkage
Entity linkage is central to almost every data integration and data cleaning scenario. Traditional techniques use some computed similarity among data structures to perform merges and then answer queries on the merged data. We describe a novel framework for entity linkage with uncertainty. Instead of using the linkage information to merge structures a priori, possible linkages are stored alongsid...